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A simple, efficient, and accurate method is proposed to map multi-dimensional free energy land- 
scapes. The method combines the temperature-accelerated molecular dynamics (TAMD) proposed 
in [Maragliano & Vanden-Eijnden, Chem. Phys. Lett. 426, 168 (2006)] with a variational recon- 
struction method using radial-basis functions for the representation of the free energy. TAMD is 
used to rapidly sweep through the important regions of the free energy landscape and compute the 
gradient of the free energy locally at points in these regions. The variational method is then used to 
reconstruct the free energy globally from the mean force at these points. The algorithmic aspects 
of the single-sweep method are explained in detail, and the method is tested on simple examples, 
compared to metadynamics, and finally used to compute the free energy of the solvated alanine 
dipeptide in two and four dihedral angles. 



I. INTRODUCTION 



The free energy (or potential of mean force) is the 
thermodynamic force driving structural processes such 
as conformational changes of macromolecules in aque- 
ous solution, ligand binding at the active site of an en- 
zyme, protein-protein association, etc. The free energy 
gives information about both the rate at which these pro- 
cesses occur and the mechanism by which they occur. 
This makes free energy calculations a central issue in bio- 
physics. Molecular dynamics (MD) simulations provide 
a tool for performing such calculations on a computer in 
a way which is potentially both precise and inexpensive 
(e.g. [UHIEI). Since a free energy is in essence the loga- 
rithm of a probability density function (see ([TJ below for 
a precise definition) it can in principle be calculated by 
histogram methods based on the binning of an MD tra- 
jectory. This direct approach, however, turns out to be 
unpractical in general because the time scale required for 
the trajectory to explore all the relevant regions of con- 
figuration space is prohibitively long. Probably the best 
known and most widely used technique to get around 
this difficulty is the weighted histogram analysis method 
(WHAM) PQ. Following [5J, WHAM adds artificial bi- 
asing potentials to maintain the MD system in certain 
umbrella sampling windows. WHAM then recombines in 
an optimal way the histograms from all the biased simu- 
lations to compute the free energy. WHAM is much more 
efficient than the direct sampling approach, and gener- 
alizations such as [5] alleviate somewhat the problem of 
where to put the umbrella windows (usually, this requires 
some a priori knowledge of the free energy landscape). 
In practice, however, WHAM remains computationally 
demanding and it only works to compute the free energy 
in 2 or 3 variables. An interesting alternative to WHAM 
is metadynamics [TJ |H] . In essence metadynamics is a way 
to use an MD trajectory to place inverted umbrella sam- 
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pling windows on-the-fly and use these windows both to 
bias the MD simulation and as histogram bins to sample 
the free energy directly (thereby bypassing the need of 
further histogram analysis in each window). 

Both WHAM and metadynamics compute the free en- 
ergy directly by histogram methods, but an alternative 
approach is possible. Unlike the free energy which is a 
global quantity, its negative gradient (known as the mean 
force) can be expressed in terms of a local expectation 
and thereby computed at a given point in the free energy 
landscape. This is the essence of the blue moon sam- 
pling strategy [9] and it offers the possibility to calculate 
first the mean force at a given set of locations, then use 
this information to reconstruct the free energy globally. 
In one dimension, this approach is known as thermody- 
namic integration and it goes back to Kirkwood |10j . In 
higher dimensions, however, this way to compute free en- 
ergies has been impeded by two issues. The first is where 
to place the points at which to compute the mean force, 
and the second is how to reconstruct the free energy from 
these data 

In this paper, we propose a method, termed single- 
sweep method, which addresses both of these issues in 
two complementary but independent steps. In a first 
step, we use the temperature-accelerated molecular dy- 
namics (TAMD) proposed in [TT] (see also [T2"l IT3]) to 
quickly sweep through the important regions of the free 
energy landscape and identify points in these regions 
where to compute the mean force. In the second step we 
then reconstruct the free energy globally from the mean 
force by representing the free energy using radial-basis 
functions, and adjusting the parameters in this represen- 
tation via minimization of an objective function. 

The single-sweep method is easy to use and implement, 
does not require a priori knowledge of the free energy 
landscape, and can be applied to map free energies in 
several variables (up to four, as demonstrated here, and 
probably more). The single-sweep method is also very ef- 
ficient, especially since the mean force calculations can be 
performed using independent calculations on distributed 
processors (i.e. using grid computing facilities [T4| IT5j). 

The reminder of this paper is organized as follows. In 
Sec. [TT] we describe the two steps of the single-sweep 
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method in detail, starting with the second one for conve- 
nience. In Sec. Ill we illustrate the method on a simple 



two-dimensional example. This example is then used for 
comparison with metadynamics in Sec. |IV| In Sec. |V| we 
use the single-sweep method to compute the free energy 
of alanine dipeptide (AD) in solution in two and in four 
of its dihedral angles. Finally, concluding remarks are 
made in Sec. I VII and the details of the MD calculation on 
AD are given in Appendix [A] 



II. THE SINGLE-SWEEP METHOD 

A. Free energy representation and reconstruction 

We shall consider a molecular system with n degrees 
of freedom whose position in configuration space O C W 1 
will be denoted by x. We also introduce a set of N collec- 
tive variables 0(x) = (9i(x), . . . , 0n(x)) which are func- 
tions of x such as torsion angles, interatomic distances, 
etc. If V(x) denotes the potential energy of the system 
and 1/(3 its temperature, the free energy A(z) in the 
variables 9(x) is defined as 



A(z)=-pT x log / e 



,-0V{x) 



6(G(x)-z)dx (1) 



so that e~P AI - z > is, up to a proportionality constant, the 
probability density function (PDF) of the variables 9(x). 

As mentioned in the introduction, the negative gradi- 
ent of the free energy, f(z) = — V z A(z), is known as the 
mean force, and it can be computed locally at point z 
via calculation of an expectation (see (111 below). In 
this section, we shall suppose that we have obtained an 
estimate of f(z) at points zt,..., Zk, and we focus on 
the reconstruction of the free energy A(z) from these 
data. A specific way to pick these points and compute 
fi ~ f(zi), ■ ■ ■ , Sk ~ f(z K ) will be given in Sec. |IIB| but 
it is worth pointing out that the reconstruction method 
proposed here works with data set collected in any other 
ways. 

Our reconstruction method uses a radial-basis function 
representation for the free energy A(z) with centers at 
Zi,...,Zk QUE]: 



K 



A(z) = a>k<po{\z -z k \) + C. 



(2) 



k=\ 



Here C is a constant used to adjust the overall height of 
A(z) but is otherwise irrelevant, | - 1 denotes the Euclidean 
and <p a (u) = ip{u/a) where <p(u) is a radial- 



norm m 



pN 



basis function; a convenient choice is to use the Gaussian 
packet 



(3) 



though other radial-basis functions (multiquadric, 
Sobolev splines, Wendland, etc. [TB]) can be used as well, 



see Sec. [V] In ^ the heights a k and the radial-basis 
function width a > are adjustable parameters which 
we determine by minimizing over a k and a the following 
objective function, which measures the discrepancy be- 
tween the negative gradient of the function A(z) in &2i 
at the centers z k , V z A(z k ) = X, fe , =1 a k *\7 zV<7 (\z k ~ z k <\). 
and the mean force f k estimated at these centers: 



K 

E 

fc=i 



K 

E 

k' = l 



ak'VzipvQzh - Zfc'l) + fk 



(4) 



Before explaining how we perform this minimization, let 
us give several reasons why the radial-basis representa- 
tion |2| for A(z) is natural and convenient. First, the 
centers z k in ([2| do not have to lie on a regular grid, 
which permits to use mean force data collected anywhere. 
Second, the representation Q can be used in any dimen- 
sion. Third, this representation has very good conver- 
gence properties, i.e. a small number of centers gives an 
accurate representation of A(z). In fact, unlike standard 
representations based e.g. on linear interpolation on a 
regular grid, the rate of convergence in K of the repre- 
sentation in ([2]) can be made independent of N (a feature 
which the radial-basis representations share with sparse 

grids nuns). 

Going back to the minimization of E(a,a), it can be 
performed as follows. For fixed a, the a* k minimizing Q 
solve the following linear algebraic system 



K 

E 



B kik ,(a)a* k ,(a) = c k (a) 



(5) 



where B kjk i(a) and c k (a) are given by 



B k , k >(a) 



Cfc(cr) 



K 

E 

k" = l 



^z«£<y{\ z k ~ Zk"\) ■ ^z<Pcr{\z k » ~ Zfc'l), 



K 

E 

k' = l 



Vz¥>a(\z k ~ Z k '\) ■ fk' 



Given the centers z k and the estimates f k of the mean 
force at these centers, the coefficients B k k > and c k can 
be easily computed, and the linear system ([5| can be 
solved by any standard technique, e.g. Gaussian elim- 
ination. Once the solution a* k = a* k {<j) of (pil is deter- 
mined, to find the optimal a* satisfying E(cT((t*), a*) — 
min CT E(a*(a), a) we compute the residual E(a*(a),a) 
for increasing values of a starting from the distance be- 
tween the centers. More sophisticated procedures could 
be used to minimize E(a*(a),a) over a, but the brute 
force method that we used proved to be efficient enough 
because computing successive solutions of ^ for vari- 
ous a is very fast. To measure the error in the approxi- 
mation, we used the residual per center defined as 



e 2 (a) = E 1 / 2 (a*(a) 1 a)/K : 



(7) 



which reaches its minimum value at the same a* as 
E(a*(a),a). 
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Overall, the procedure is simple and inexpensive since 
the determination of a k at fixed <r is computationally 
straightforward and cheap, and can be easily repeated to 
perform the one-dimensional minimization over a. One 
caveat that we should mention, however, is that the con- 
dition number of the matrix (or) increases rapidly 
when the number of centers and/or a increase. This is 
a known problem of radial-basis functions |17j . To avoid 
any problems, we capped the admissible condition num- 
ber at 10 12 and, in situations where this threshold value 
was reached while ^{cr) was still decreasing, picked for 
cr* the corresponding value of a. These situations only 
occurred in the two-dimensional example (see Sec. Ill) 
when a lot of centers were used (500 or more, i.e. much 
more than what will be used in the AD example), and 
even in these cases, such a large condition number did 
not lead to any noticeable loss of accuracy in the results 
(even though the coefficients a k were then very large). 
We also observed that, given a number of centers z k and 
a value of a, the condition number is typically lower when 
the dimension of z is larger. Finally, we observed that 
the condition number was much lower with the Wend- 
land radial-basis function (see ( fl6| in Sec. [V]) than with 
the Gaussian radial-basis function 



B. TAMD for sweeping 

It remains to explain how to identify the centers 
Zi,.. .zk and estimate the mean force at these points. 
Following [11], we will do so using the extended system 



N 



Mx = -V x V(x) - k 2J(0 Q (aO _ z a )V x 8 a (x) 
+ thermostat terms at (3 



7 z = k(8{x) -z) + \J 27/3- 1 q(t) 

(8) 

where M is the mass matrix, rj(t) is a white-noise, 
i.e. a Gaussian process with mean and covariance 
(rj a (t)r] a i(t')) — S aa 'S(t — t'), and n > 0, the friction 
coefficient 7 > and the artificial inverse temperature 
1/(3 (^ 1/(3) are parameters whose role we explain now. 

The system in Q describes the motion of x and z over 
the extended potential 



U k (x,z) = V(x) + ±k\0(x)-z\ 



(9) 



As shown in |llj . by adjusting the parameter k so that 
z(t) w 0(x(t)) and the friction coefficient 7 so that z 
moves slower than x 7 one can generate a trajectory z(t) 
in z-space which effectively moves at the artificial tem- 
perature 1 /(3 on the free energy computed at the physical 
temperature 1/(3. By taking 1/(3 > 1/(3, the z(t) trajec- 
tory visits rapidly the regions where the free energy is 
relatively low (i.e. within a range of a few 1/(3) even if 
these regions are separated by barriers which the system 



would take a long time to cross at the physical tempera- 
ture 1/(3. This gives us a way to determine automatically 
where are the relevant regions in free energy space. 

In [TT], the extended system in ^ was proposed to 
sample the free energy landscape directly. Here, we make 
a different use of Q: we utilize the trajectory z(t) to 
rapidly sweep through z-space and generate the centers 
Zk used in the radial-basis representation Q . 
Specifically, we start from z(0) = z\, then deposit a new 
center z k along z(t) each time z(t) reaches a point which 
is more than a prescribed distance d away from all the 
previous centers, where d is a parameter controlling the 
density of the covering by the centers (the smaller d, the 
higher the number of centers deposited). At the same 
time, at each of these centers Zk, we launch a simulation 
of ^ with z(t) = Zk fixed, i.e we use 



N 



Mx 



= -V x V(x) - K^2(9 a (x) - z k , a )V x 9 a (x) ,j 



(10) 



thermostat terms at (3 



and compute: 



fk = ^J R(z k -6(x(t)))dt. 



(11) 



The calculations of these time averages are independent 
of each other, and hence they can be distributed, using 
(ideally) at least one processor per center z k , an approach 
that optimally fits with the purposes of grid comput- 
ing |T4j [15] . The estimator in (11) has the advantage of 



being simple, but it introduces an error due to the finitc- 
ness of R. This error can be decreased by using R in ( 10 1 
and (111 larger than k in (I8J) , or even eliminated by using 



constrained instead of restrained simulations and using 
the blue- moon estimator for the mean force [201 ET] , 

Once the centers Zi, . . . , z k have been deposited and 
the estimates fi,. .■ ,fx of the mean force at these cen- 
ters have been obtained, we use the reconstruction pro- 
cedure explained in Sec jll A| and compute the optimal 
set of coefficients a\ and the optimal a* to use in the 
representation ^ for A(z). 

We conclude this section by stressing that using the 
extended dynamics ([8| to sweep through z-space and de- 
posit the centers z k is very different than using it to 
sample A(z) directly, which makes our approach very 
different from WHAM or metadynamics. Unlike with 
sampling, revisiting twice a region in z-space is unnec- 
essary and even undesirable since no new center will be 
deposited. The accuracy of the reconstruction depends 
on the number of centers and the accuracy at which the 
mean force is computed in (111 much more than the pre- 



cise locations where the centers are deposited. An impor- 
tant practical consequence is that it is rather straightfor- 
ward to pick the parameters k and 7 in ^ since the final 
result is robust against variations in these parameters. 



X 



a 



FIG. 1: A trajectory generated by simulating ( 12 \ by forward 
Euler with a time-step At — 2 ■ 10~ 5 for 2 ■ 1(T steps (white 
curve) shown above the contour plot of the Mueller potential 
(with 29 level sets evenly distributed between V = and V = 
180 in a scale where the minimum of the potential is V — 0). 
The red circles are the locations of the centers deposited along 
the trajectory using d — 0.175. In this run, 174 centers were 
deposited. 



III. TWO-DIMENSIONAL ILLUSTRATIVE 
EXAMPLE 



FIG. 2: Residual per center e2(u) defined in Q for the re- 
construction of the Mueller potential with the 2 • 10 4 steps 
single-sweep trajectory shown in Fig. [T] The optimal a for 
this run was a* = 0.398. 



original Mueller potential, while Fig. [4] compares the val- 
ues of the original and reconstructed Mueller potential 
along the TAMD trajectory shown in Fig. [I] As a simple 
estimate of the error, we used: 



Since, given the location of the centers z&, the mean 
force estimation at these points is quite standard, as a 
first illustration we use a two-dimensional example for 
which 6(x) = x = (x,y) and A(z) = V(x) where V(x) 
is the Mueller potential )22j . In this case, there is no 
need to extend the system as in (JsJ) , and the temperature 
accelerated dynamics simply reduces to (setting 7 = 1 by 
appropriate rescaling of time) 



ei 



x = -VV{x) + v / 2/3- 1 ri(t) . 



(12) 



Fig. 1~| shows a TAMD trajectory generated by solv- 
ing (12 1 by forward Euler with the initial condition 
(x(0),y(0)) = (1,0) and a time-step of At = 2 • 1(T 5 
for 2 • 10 4 time-steps at 1/(3 = 40 (for comparison the en- 
ergy barrier between the two main minima of the Mueller 
potential is about 100). Also shown are the centers 
Zk = {xk,Dk) obtained by depositing a new center along 
the trajectory each time the trajectory reaches a point 
which is d = 0.175 away for all the previous centers. In 
this run, 174 centers were deposited. At the centers, we 
used — VV(xk,yk) = fk as estimate of the "mean force" 
(i.e. there is no sampling error in the present example). 
We then used these data to reconstruct the free energy 
as explained in Sec. |IIA| Fig. [2] shows the residual per 
center e2(cr) defined in ([7j). The optimal a for this run 
was a* = 0.398 and the condition number at this a* 
was 7 • 10 6 . The level sets of the reconstructed poten- 
tial are shown in Fig. [3] and compared to those of the 



Sg\V{x)-V{x)\dx 
J n \V(x)\dx 



(13) 



where V(x) denotes the reconstructed potential and f2 is 
the domain in which the original potential remains less 
th an 1 80 above its minimum value. The error defined 
in (13 1 for this calculation was e\ = 4.2 • 10~ 3 . 



These results, which are already very good, can be im- 
proved by diminishing d and thereby increasing the num- 
ber of centers without having to increase the length of the 
TAMD trajectory. For example, by taking d — 0.12, we 
obtained 351 centers in a trajectory still 2-10 4 steps long. 
Using these centers to reconstruct the Mueller potential, 
we obtained e\ = 3.2 • 10~ 4 . The level sets of the re- 
constructed and original potential defined in Fig. [3] now 
overimposed so perfectly that they could not be distin- 
guished on the scale of Fig. [3] (data not shown). The 
optimal a for this calculation was a* = 0.362, at which 
value the residual was (a*) = 4.7 • 10" 3 and the condi- 
tion number was 7 • 10 11 . 

Finally, we note that the result can also by improved 
by keeping the same distance d = 0.12 between centers 
but increasing the length of the TAMD trajectory. For 
instance, increasing the number of steps to 5 • 10 4 pro- 
duced a reconstruction with an error ei = 7.1 • 10 -5 (data 
not shown). 
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FIG. 3: Comparison between the level sets of the original 
Mueller potential (red curves) and the reconstructed potential 
using |2| (black curve). Here we use the 174 centers shown in 
Fig. [T] The optimal a is a* = 0.398, see Fig. [5] We show 29 
level sets evenly distributed between V = and V = 180. The 
level sets of the reconstructed potential and the original one 
are in so close agreement that they can only be distinguished 
in some localized regions (e.g. near the saddle point between 
the two minima in the lower right corner). 



IV. COMPARISON WITH METADYNAMICS 

Because metadynamics [3 [8] also uses an extended dy- 
namical system for x and z and the Gaussian packet ^ 
to represent A(z), the single-sweep method bears similar- 
ities with it. Yet, there is an essential difference between 
the two methods. Unlike the single-sweep method, meta- 
dynamics does not use the mean force, and estimates 
A(z) by direct sampling, which turns out to be a less 
efficient way to proceed. Let us elaborate on this claim. 

Recall that metadynamics uses an extended system 
like ^ but where the equation for z is replaced by [23] 



7 i = k(8(x) -z) + ^2 1 p- l r 1 (t) 

ft (14) 
+ v V ,<p a {\z(t) - z(t')\)dt' . 
Jo 

Here 1 / (3 is now the physical temperature of the system, 
and k and 7 are parameters playing the same role as 
in ^ . The integral term in ( [T^ is a flooding term (with 
v > controlling the flooding rate) which deposits Gaus- 
sian packets <p a (u) on the energy landscape wherever z(t) 
goes, thereby progressively leveling the effective free en- 
ergy landscape felt by z(t). The negative of the integral 
of the Gaussian packets deposited then gives an approx- 
imation of the free energy. For a trajectory with -/V max 
time-steps of size At, the time-discretized approximation 
of this integral reads (compare (pj) 

iV max 

A(z) = vAt J2 <p*(\z - z(nAt)\) + C (15) 



FIG. 4: Comparison between the original (blue line) and re- 
constructed Mueller potential (red line) computed along the 
2 ■ 10 4 steps single-sweep trajectory shown in Fig. [I] The 
green line shows the absolute value of the difference between 
the two. 



where C is a constant used to adjust the height of A(z). 

Despite the fact that (151 uses Gaussian packets which 
are radial-basis functions, the representation (151 is 



very different from the standard radial-basis representa- 
tion ([2]) used in the single-sweep method. In particular, 
there are no coefficients to adjust in ( 15 ). This has the 



important consequence that, instead of requiring a single 
sweep across 2-space to get an accurate estimate of the 
free energy, metadynamics requires that the trajectory 
revisits many times the same locations in z-space to de- 
posit centers (i.e. iV max in (151 must be much larger than 
K in ([2]) to achieve the same accuracy). This is because 
the leveling out achieved by the integral term in (14 1 
and, hence, the convergence of the representation (15 1, 
only occur statistically [2H (in contrast, the mean 
force data used at each center in the single sweep method 
contains already all the statistical information needed at 
that center). This is consistent with metadynamics be- 
ing in essence an histogram method, albeit one where the 
histogram windows are adjusted on-thc-fly. 

What this entails in terms of efficiency can be illus- 
trated on the two-dimensional Mueller example consid- 
ered before, fn this example, to generate the metadynam- 



ics trajectory we used ( 12 1 with flooding terms added as 
;h what was done in Ref. [25" 



in (U4h, consistent wit 



to 

test the efficiency of metadynamics in a similar set-up. 
Fig. [5] show s th e metadynamics trajectory obtained by 
integrating (14) for 2 • 10 4 timesteps with a time-step of 
At = 2 ■ 10 "(same as in the single-sweep method for 
the results shown in Figs, [l] to [2|. As can be seen in 
Fig. [5] this number of timesteps was enough for the tra- 
jectory to visit the important regions of the potential. 
The reconstructed free energy from this calculation is 
compared in Fig. [6]to the original one. The metadynam- 
ics result is also compared in Fig. [7J to the one obtained 



FIG. 5: Metadynamics trajectory (white line) with 2 • 10 4 
steps overimposed on the original Mueller potential. 



by the single-sweep method with 174 centers. The er- 
ror ( 13 1 for this metadynamics calculation was e\ = 0.16, 
i.e. almost two orders of magnitude higher than with 



the single-sweep method. Fig. [8] shows the original and 
reconstructed Mueller potential along the metadynam- 
ics trajectory (blue and red lines, respectively), together 
with the absolute value of their difference (green line). 
By comparing Figs. [4] and [8] it can be seen that the 
discrepancy between the original and reconstructed po- 
tential is much larger with metadynamics than with the 
single-sweep method on trajectories of the same length. 
Note that these results clearly indicates that it is not 
sufficient that the metadynamics trajectory visits once a 
region on phase space to get an accurate representation 
of the free energy in this region. This was already noted 
in Refs. [HHS]. 

We were able to improve the metadynamics result by 
extending the simulation to 2 • 10 5 steps. The covering 
of the important regions in the potential was now exten- 
sive (data not shown), and the reconstructed potential 
(data not shown) looked visually better than the one ob- 
tained with the shorter trajectory. Yet the error ( 13 ) 
was ei = 0.12, i.e. still two orders of magnitude larger 
than the highest error we obtained with the single-sweep 
method using a 10 times shorter trajectory. We did not 
attempt to go to longer runs with metadynamics because 



the memory term in ( 14 1 makes such simulations increas- 



ingly expensive (their cost scales as the square of the 
number of timesteps). It should also be stressed that in 
all these calculations, we optimized the parameters v, a 
and 1/(3 the best we could. This optimization, however, 
turns out to be complicated since there is no systematic 
way to perform it because, unlike with the single-sweep 
method, there is no objective function to minimize in 
metadynamics. The results shown in Figs. p>H8]were ob- 



FIG. 6: Comparison between the level sets of the original 
Mueller potential (red curves) and the reconstructed potential 
using metadynamics with a trajectory of 2 ■ 10 4 steps with a 
time-step of At — 2 - 10 -5 (same as in Figs [l]and[3]). We only 
use 10 level sets evenly distributed between V = and V = 
180 because the differences between the maps are much bigger 
than with the single-sweep method and drawing more level 
sets makes the figure difficult to read. (There are only seven 
black level sets in the energy reconstructed by metadynamics 
because it levels off around V = 140.) 



tioning that the simplicity of the Mueller potential ex- 
ample tends to exaggerate the gain that the single sweep 
method provides over metadynamics. Indeed, in realistic 
situations, the single-sweep method also requires to com- 
pute the mean force via (11), an operation which was 



tained with v = 2 • 10 3 , a = 0.2 and 1//3 = 10. 

To be fair, we should conclude this comparison by men- 



unnecessary in the Mueller example since the force was 
readily available. Computing the mean force adds an ex- 
tra cost to the method. It is worth stressing again, how- 
ever, that the computation of the time averages in (111 
can be distributed over several processors. This means 
that in the ideal situation where the user has at least one 
processor per center, the effective time to compute all of 
the mean forces is the same as the one for computing a 
single one of these forces, i.e. we have perfect scalability. 
Metadynamics can be parallelized per replica as well, as 
was proposed in Refs. [261 EH EE] , but not as straight- 
forwardly and not with perfect scalability. Indeed, in all 
of these versions of metadynamics, the simulated replica 
are never completely independent from each other. 



V. FREE ENERGY OF ALANINE DIPEPTIDE 
IN SOLUTION 

In this section, we use the single-sweep method to re- 
construct the free energy of the solvatcd alanine dipep- 
tide (AD) molecule in two and four torsion angles at 
300 K. While AD is not an example of biochemical in- 
terest per se, we study it because it has been extensively 
used as a benchmark example for free energy calculations 
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FIG. 7: Comparison between the Mueller potential as recon- 
structed by the single-sweep method (upper panel) and meta- 
dynamics (lower panel) both with a trajectory of 2 • 10 4 steps 
and a time-step of At = 2 • 10 -5 . Other representation of 
these contourplots were already shown in Figs. [3] and [^respec- 
tively. The map reconstructed by the single-sweep method 
is very close to the map of the original Mueller potential. 
The colormaps used in both panels are the same and the re- 
constructed potentials are shifted so that their minimum is 
V = 0; the white region in the left panel is where the energy 
is above 180 and is not shown (the result of metadynamics 
shown in the right panel levels off around V = 140 which is 
why there is no white). 



in the literature [BJ (251 ISO]- On top of this the system 
is simple enough that we can use it to systematically in- 
vestigate how the accuracy of the reconstruction method 
depends on the number of centers and how robust the 
method is with respect to statistical errors in the input 
data for the mean forces. Another question we investigate 
in this section is the robustness of the method against the 
choice of radial-basis functions. Specifically, we compare 
results obtained using the Gaussian packet pi and the 
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FIG. 8: Comparison of the original (blue line) and recon- 
structed Mueller potential (red line) computed along the 2-10 4 
steps metadynamics shown in Fig. [5] The green line shows 
the absolute value of the difference between the two. The 
green line here should be compared with the one in Fig. [4] for 
the single-sweep method: the discrepancy between the orig- 
inal and reconstructed potential is always larger with meta- 
dynamics than with the single-sweep method. 

Wendland function 

tp(u) = (1 -w) + (35u 2 + 18u + 3) (16) 

where (/(«)]+ = /(«) if f(u) > 0, and (/(«))+ = 
otherwise. (i"6| is another well-known example of radial- 
basis function which has the pleasant property that it is 
compactly supported. This property is appealing in the 
calculations since it limits the range over which centers 
interact in Q. 

All MD simulations reported below were performed 
with a version of the MOIL code |3T] suitably modified 
by us, and the AMBER/OPLS gj force field (for details 
of the MD set-up see Appendix [AJ . 

A. Two angles calculation 

We use the standard dihedral angles <f> and ip. 
At 300 K, the system is confined in a region of 
the (<j), ip) space with <p < —50° by energy barriers 
higher than 1/(3. In order to overcome these barri- 
ers and sweep through the whole [— 180°, 180°] 2 space, 
we generated a trajectory by using (|8| with (z 1; z 2 ) = 
((/), ijj), k = 100 kcal/mol/rad 2 , a friction coefficient 
7 = 0.5 kcalxps/mol/rad 2 and an artificial temperature 
1//3 = 9.5 kcal/mol. With this choice of the parameters, 
the important regions of the [— 180°, 180°] 2 space were 
visited in 4 • 10 4 steps (40 ps in the time units of the 
MD variables). The time series of </> and ip along this 
trajectory are shown in Fig. [9j Variations in k, 7 and 
1//3 led to qualitatively similar TAMD trajectories, in- 
dicating that the the method is robust with regard to 




20 
time (ps) 

FIG. 9: Time series of the dihedral angles (f> and tp along the 
40 ps long TAMD trajectory for the solvated alanine dipeptide 
(AD). 



the choice of these parameters. Sets with a different 
number K of centers were deposited along the TAMD 
trajectory afterwards by processing this trajectory using 
various distances d between the centers. Specifically, we 
generated sets of 90, 128, 151, 188, 219 and 262 centers 
using, respectively, d = 31.77°, 26.00°, 23.87°, 21.37°, 
20.00°, 17.92°. 

Given a set of K centers (4>k,ipk), we computed the 
mean forces via K independent MD simulations with re- 
straints at {4>,ip) = {4 > k^k)i i-e. by simulating (10 1 in 



the isokin etic ensemble at 300 K, and estimated the mean 
force via (111 with R = 100 kcal/mol/rad 2 . This value 



of R was high enough since we checked that the recon- 
structed free energy remained invariant with higher val- 
ues of R (we did so up to R = 10 3 kcal/mol/rad 2 ). We 
then used this data in the reconstruction procedure ex- 
plained in Sec. |II A| Note that since the free energy in 
the {((>, ip) angle is periodic, we have to periodically ex- 
tend the centers for the representation. This amounts to 
changing the representation in Q into 



K 

A(z)= ^2a k ip a (\z - z k + 2ti h-e\) + C, (17) 



■hez N k=i 



where e is the unit vector in R*. In practice, only few 
periodic replica of the centers are needed (i.e. hi — 2 for 
i = 1 , . . . , N) because the radial-basis functions centered 
at the centers further away from the cell under consider- 
ation make negligible contributions to the result in this 
cell. 
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FIG. 10: Free energy of AD in the <f> and ip dihedral an- 
gles at 300 K calculated with the single-sweep method by us- 
ing 188 centers deposited at a distance of d = 21.37° from 
each other. Units for the free energy are kcal/mol, and 
contour levels are plotted at 0.5 kcal/mol, 1 kcal/mol, and 
then every 1 kcal/mol. The optimal a in this reconstruction 
was a — 44.73°. The centers are represented as white cir- 
cles. At every center, the corresponding mean force vec tor is 
also shown. Mean forces were calculated by using (111 with 
R = 100 kcal/mol/rad 2 and T — 50 ps. 



1. Calculation with d — 21.37° (188 centers) and 
T — 50 ps. 

We first detail our result with this choice of param- 
eters to pick an example which led to a good balance 
between accuracy and efficiency. Other choices of param- 
eters are discussed below. 



Thus, Fig. 10 shows the recon- 
structed free energy map obtained with d = 21.37° (188 
centers) and by computing the mean forces from (111 
with T = 50 ps. The optimal a in this calculation was 
a* = 44.73°. In the figure, the minimum of the free en- 
ergy is set at kcal/mol, and contour levels are plotted at 
0.5 kcal/mol, 1 kcal/mol and then every 1 kcal/mol. The 
centers are represented as white circles, and the mean 
forces at the centers as arrows. 

Since the free energy map depends on the force field 
used, comparison with results in the literature is difficult. 
To assess the accuracy of our result self-consistently, we 
compared it with the free energy calculated by comput- 
ing the PDF of <f> and ijj from a direct MD simulation 
(DMDS) of about 30 ns. While this trajectory does not 
cover all the [—180°, 180°] 2 space, it covers the important 
regions and allows for an unbiased estimation of the free 
energy in these regions which can be used as benchmark. 
The left panel in Fig. [TT] shows the contour levels of the 
free energy from the single-sweep (black lines) and that 
from DMDS (red lines). The contour levels are plotted 
at 0.1 kcal/mol (dotted lines), every 0.5 kcal/mol from 
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FIG. 11: Comparison between the free energy obtained by 
single-sweep (black lines) method and DMDS (red lines) for 
AD. The left panel shows the result with with 188 centers, 
the right panel the one with 262 centers. The contour levels 
of the free energy are plotted at 0.1 kcal/mol (dotted lines), 
from 0.5 to 4.0 kcal/mol separated by 0.5 kcal/mol, and then 
separated by 2 kcal/mol. 



0.5 to 4 kcal/mol, and then every 2 kcal/mol. As can 
be seen, single-sweep results agree remarkably well with 
those of the DMDS. 
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In terms of cost, to generate the result shown in Fig 
we had to make one simulation run of 40 ps to gener- 
ate the TAMD trajectory, plus 188 independent runs of 
50 ps distributed on different nodes (the additional cost 
of estimating the parameters a* k and a* to use in ^ is 
insignificant). This makes for a total of 9.4 ns of absolute 
simulation time. However, after distribution, the effec- 
tive simulation time needed is only 90 ps. On top of this, 
we show below that a good estimate of the free energy 
can be obtained with as low as 90 centers (i.e. with an 
absolute simulation time of 4.5 ns and the same effec- 
tive simulation time, 90 ps). For comparison, in Ref. [30] 
Ensing et al. report a 4 ns calculation performed with 
metadynamics to estimate the free energy of AD in <p and 
ijj. It is not clear how well this metadynamics calculation 
can be parallelized to reduce its effective cost (it was not 
parallelized in Ref. [30 ). In addition, the result of this 
metadynamics calculation is unlikely to be as accurate 
as the one in Fig. [To] (in Ref. jSUj no comparison like the 
one shown in Fig. 11 is provided) 



2. Robustness and convergence analysis 

Next we analyze how robust are the results with re- 
spect to the statistical error in the mean force data and 
the choice of radial-basis function. We also analyze con- 
vergence in function of the number of centers. As ref- 




FIG. 12: Difference maps with respect to the AD free energy 
in <j> and ip reconstructed with Gaussian functions using 262 
centers and T = 250 ps. The figures in the different panels 
correspond to various number of centers and length of time av- 
eraging for the mean forces, as indicated. Units are kcal/mol. 
Note that the scale of the colormap is different from the one 
in Fig. |10| In particular, the differences are mostly below 
0.5 kcal/mol with 151 centers and 50 ps simulations already. 



erence value, we take the free energy reconstructed with 
d = 17.92° (262 centers) and T = 250 ps of time av- 
eraging in ( pTj ). The map of the free energy calculated 
with these parameters (data not shown) is visually very 
similar to the one shown in Fig. |10[ but it is more ac- 
curate. The residual error can be estimated from the 
right panel of Fig. [TT] which shows the contour levels of 
the free energy from the single-sweep (black lines) and 
that from DMDS (red lines): these level sets coincide 
up to statistical errors in the DMDS, indicating that the 
free energy provided by the single sweep method with 
262 centers and T = 250 ps can indeed be taken as an 
"exact" benchmark. 

Fig. [12] shows the differences between the map of the 
reference free energy reconstructed with d — 17.92° (262 
centers) and T = 250 ps and those reconstructed with less 
centers and shorter restrained simulations. The largest 
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FIG. 13: Residual per center versus number of centers in the 
reconstruction of the AD free energy in <f> and ip angles. Data 
are from calculations with different time averaging length, 
using Gaussian (G) and Wendland (W) basis functions. 




FIG. 14: Free energy of AD in the <p, ip, 9 angles obtained from 
the marginal in these angles of the PDF associated with the 
free energy in four angles A(<j),ip,0,(). Data are represented 
for 8 £ [—50°,— 3°]. Note that the scale of the colormap is 
different from the one in Fig. [To] 



B. Four angles calculation 



errors are in the regions corresponding to the highest 
peaks of the free energy (these are also the regions were 
the least centers were deposited). The differences never 
exceed 1.25 kcal/mol with 90 centers and T = 50 ps and 
they fall mostly below 0.5 kcal/mol with 151 centers and 
T = 50 ps already. 

We also compared the quality of the reconstruction 
of the free energy when using Gaussian ([3| and Wend- 
land (16 1 basis functions. Fig. 13 shows the residual per 
center versus the number of centers for AD, by using 
Gaussian (filled symbols) and Wendland (empty sym- 
bols) basis functions, using T = 50 ps (diamonds) and 
T = 250 ps (circles) long restrained simulations to esti- 
mate the mean forces. At equal values of K and T, the 
reconstruction is slightly more accurate with Gaussian 
than with Wendland functions, though these differences 
turn out to be quite small in terms of the free energy 
maps themseves (data not shown) . 

Using longer simulations for the mean force (which 
means a smaller random error on these forces) also im- 
proves the results. With 262 centers and T — 250 ps long 
simulations for the mean forces, the maps reconstructed 
with Gaussian and Wendland basis functions were almost 
identical (data not shown), and they were not signifi- 
cantly different from the map shown in Fig. 
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results, however, were obtained at very different values of 
optimal a: a* = 26.33° with Gaussians and a* — 123.20° 
with Wendland functions. The condition numbers at the 
optimal a were 3889.67 and 868.86, respectively, which 
is low. Note that the same trends here described were 
observed in a periodic test case for which the potential 
was known exactly (data not shown) . 



As a second more challenging test, we computed 
the free energy of AD in the four torsion angles, 
c6, ip, 9, and C- A TAMD trajectory of 44 ps 
was generated by using ^ with (zi, Z2, Z3, Z4) = 
(<f>, ip, 9,C), k — 100 kcal/mol/rad 2 , an artificial tem- 
perature 1/(3 — 9.5 kcal/mol, friction coefficients 
7 = 0.5 kcalxps/mol/rad 2 for <p and ip and 7 = 
1 kcalxps/mol/rad 2 for 9 and £. The MD potential keeps 
the amide planes in trans configuration, and so 9 and 
C were varying in the range [—70°, 70°]. In 44 ps, the 
TAMD trajectory covered well the accessible state space 
for <fi, ip, 9 and £, in the sense that the time series for 
these angles were similar in their respective state space 
to those shown in Fig. [9] (notice however that extensive 
coverage of the four-dimensional space is unlikely in so 
short a run). Along the TAMD trajectory, 200 centers 
at a distance of d = 45.84° were deposited. At these 



centers, the mean forces fk were computed by using (111 
with k — 100 kcal/mol/rad 2 and T — 50 ps (i.e. the abso- 
lute time of simulation was about 10 ns, but the effective 
time after distribution was 94 ps only) . We used these fk 
in the objective function Q to finally get the representa- 
tion ([2]) of the four-dimensional free energy A{<p, ip, 9, £). 
The optimal a in this representation was a* — 67.63° 
and the condition number at this value of a was 4214.05. 

Since a full graphical representation of A(jp, ip, 9, Q 
is not possible, we did several tests to validate our re- 
sult. Fig. [14] shows the three dimensional free energy 
A(<p, ip, 9) obtained from the marginal in these angles of 
the PDF associated with A(<p, ip,9,Q. This marginal was 
calculated a posteriori by numerical integration over £ of 
e -/3A(0,0,e,c) with the f uU A(4>,ip,6,C) reconstructed by 
the single-sweep method. The map is reasonable, and 
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FIG. 15: Free energy of AD in the <f>, ip angles obtained from 
the marginal in these angles of the probability density associ- 
ated with the free energy in four angles A(<f>, tp, 0, £). Contour 
levels are as in Fig. |10| Note the remarkable agreement be- 
tween this map and the one shown in Fig. |10| 



shows nontrivial features in all three directions. Fig. [15] 
shows the two dimensional free energy A(cj>, ip) obtained 
from the marginal in these angles of the PDF associated 
with A(4>, tp, 9, C)- The map is in remarkably good agree- 



ment with the one in Fig. 10 



As a further test of accuracy, we re-calculated the 
mean force using ( 10 1 and pT| using a different set of 
centers than those used in (fek. Then we estimated the 
relative error between these mean forces and the ones 
obtained by taking the negative gradient of the recon- 
structed A((j), ip, 9, () using the original set of centers and 
mean forces: 



£fc 



|V 2 I(^) + /»| 
\fk n \ 



(18) 



where z"£ are the new centers and f£ are the mean forces 
at these centers. The new centers were 20 points chosen 
at random in the domains <j>, ip e [-180°, 180°], 6, ( <E 
[-70°, 70°]. 

Fig. [16] top panel, shows, for each of these centers, the 
distance from the closest of the 200 centers (black line). 
Data are compared to the minimal distance between the 
200 centers (red dashed line). This result shows that, 
with d = 45.84°, these centers fill properly the four di- 
mensional domain in the sense that every new center is 
always a distance about d to one of the original cen- 
ters. Fig. |16| middle panel, shows the relative error 
for k — 1,...,20 (black solid line), when T = 50 ps 
long restrained simulations are used to compute the mean 
forces. The mean value of £k (black dashed line) is 0.14, 
with standard deviation 0.09 and maximum value 0.47. 
For comparison, the mean value of the relative residual 
per center (red dashed line) is 0.09, with standard devi- 
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FIG. 16: Accuracy of the reconstruction of the free energy of 
AD in four angles. Top panel, distance of each of the ran- 
dom centers from the closest of the 200 centers (black line), 
compared with the minimal distance between the 200 centers 
(red line). Middle and lower panel, relative error Ek defined 
in (|18|) for mean forces computed respectively from 50 and 



250 ps restrained simulations: the error Ek (black solid line) 
and its mean value (black dashed line), compared with the 
mean value of the relative residual per center for the 200 cen- 
ters set (red dashed line) and its maximum value (red dashed- 
dotted line). 



ation 0.06 and maximum value 0.44 (red dashed-dotted 
line). Fig. 16 bottom panel, shows when T — 250 ps 



long restrained simulations are used to compute the mean 
forces. In this case, the mean value of (black dashed 
line) is 0.12, with standard deviation 0.07 and its max- 
imum value is 0.34. For comparison, the mean value of 
the relative residual per center (red dashed line) is 0.09, 
with standard deviation 0.06 and maximum value 0.39 
(red dashed-dotted line). 

These results show that, in points away from the orig- 
inal centers, the reconstructed free energy is as accurate 
as it is at the centers, which is clearly the best we can 
hope for. 



VI. CONCLUDING REMARKS 

In summary, we have proposed a method for the cal- 
culation of free energies which is simple, accurate, and 
efficient. Unlike standard histogram methods such as 
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WHAM and metadynamics, the single-sweep method 
uses the mean force computed at a set of centers to recon- 
struct the free energy. This set of centers is determined 
using TAMD to rapidly sweep through the important re- 
gions of the free energy, and the mean forces at these 
centers are estimated in a standard way via the compu- 
tation of a conditional expectation using time-averaging 
along restrained or constrained simulations. From these 
data, the free energy A(z) is then reconstructed globally 
by minimization of an objective function to determine 
the coefficients in a radial-basis function representation 
of A(z). If convenient, this reconstruction step can use 
data for the centers and the mean forces obtained by 
other means than TAMD. 

Compared with histogram methods and metadynam- 
ics, the single-sweep technique combines several advan- 
tages: 

• It does not require a priori knowledge of the free 
energy since it uses TAMD to find the important 
regions in the landscape automatically. 

• The most costly step of the calculation, namely the 
computation of the mean forces at the centers, can 
be straightforwardly distributed on different, inde- 
pendent, processors. 

• The reconstruction step is variational, i.e. the 
optimal coefficients in the free energy representa- 
tion are determined automatically, which limits the 
number of parameters to adjust beforehand. 

• The results can be easily monitored for conver- 
gence, and systematically improved if desired. In 
particular, new centers can be added on top of pre- 
vious ones along the same TAMD trajectory to in- 
crease the accuracy without having to repeat the 
previous calculation. 

• The method can be used in more than 2 dimen- 
sions and its computational complexity is the same 
regardless of the dimension. 

We believe that these features make the single-sweep 
method appealing to calculate the free energy of systems 
more complicated but also more interesting than the ones 
studied in this paper. 
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APPENDIX A: DETAILS OF THE MD 
SIMULATIONS 



All MD simulations were performed with the MOIL 
code [31], and the AMBER/OPLS [32] force field as 
implemented in the code. A starting structure for the 
AD molecule (CH 3 -CO-NH-C Q HCH 3 -CO-NH-CH 3 ) was 
solvated in a box of 252 water molecules of volume 
(20 A) 3 . Periodic boundary conditions were used. Van 
der Waals interactions were truncated at 9 A. Electro- 
static interactions were treated with the Particle Mesh 
Ewald method [33] with real space cutoff 9 A, a grid of 
32 3 points, and 4-th order 5-splines for the interpola- 
tion of the structure factor (in order to be in the high 
accuracy range [33] )■ The TIP3 model [34 was used 
for the water molecules. Non-bonded interaction lists 
were updated every 10 steps. All chemical bonds in the 
system were kept fixed with the SHAKE algorithm [35 . 
Amide planes were restrained to be always in trans con- 
figuration. The Velocity Verlet algorithm was used for 
the dynamics of the Cartesian variables with time-step 
1 fs, and all velocities were scaled at every step to keep 
the temperature at 300 K. In order to obtain the initial 
configuration for the temperature accelerated MD sim- 
ulation (TAMD), the system was first equilibrated for 
100 ps by keeping the solute molecule fixed (i.e. by zero- 
ing forces and velocities of its atoms) and by assigning to 
all water atoms at every step velocities sampled from a 
Maxwell distribution at 300 K. Then, the whole system 
was simulated for 400 ps for equilibration. The torsion 
angles used in the simulations are defined by the quadru- 
plets of atoms (C,N,C a ,C) and (N,C Q ,C,N) for 4> and tp, 
and (0,C,N,C Q ) and (C Q ,C,N,H) for 9 and (. In the 
TAMD simulation, the equations of motion of the col- 
lective variables were integrated with the forward Eulcr 
scheme with time-step 1 fs, 7 — 0.5 kcalxps/mol/rad 2 
for (f> and ip and 7=1 kcalxps/mol/rad 2 for 9 and 
£. The force constant for the restraint potential was 
k = 100 kcal/mol/rad 2 , and the effective temperature 
such that = 9.5 kcal/mol. The Cartesian coor- 

dinates of water and AD atoms were saved during the 
TAMD simulation. In this way, for every center z^ de- 
posited along the trajectory in collective variables space, 
there is a corresponding configuration of the system 
in Cartesian space such that 9(xk) ~ z^. We used these 
configurations as initial conditions for the restrained sim- 
ulations at the centers. Data for the mean force calcu- 
lations were accumulated after further relaxation of the 
system for 5 ps. 
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